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ABSTRACT 


The  effectiveness  of  several  iterative  techniques  for  solv- 
ing matrix  equations  resulting  from  finite  difference  approxima- 
tions to  self-adjoint  parabolic  and  elliptic  partial  differential 
equations  is  reviewed.   The  techniques  are: 

1.  SIP-  Strongly  Implicit  Procedure 

2.  ICCG-  Incomplete  Cholesky  Conjugate  Gradient  Method 

3.  VICCG-  Vectorized  ICCG 

4.  MICCG-  Modified  ICCG 

5.  DSCG-   Diagonally  Scaled  Conjugate  Gradient  Method 

6.  POLCG-  Polynomial  Preconditioned  Conjugate  Gradient 

Method. 

7.  PICCG-  Polynomial  form  of  ICCG 

The  comparison  is  made  on  a  vector  machine  (two-pipe  Cyber 
205)  where  vectorization  of  the  code  is  done  primarily  by  the 
vector  compiler  available.  It  is  found  that  of  the  methods 
studied,  POLCG  and  MICCG  appear  to  require  the  least  amount  of 
CPU  time.  An  advantage  of  MICCG  over  POLCG  is  that  it  is  less 
sensitive  to  increasing  matrix  size.  Its  disadvantages  are  that 
it  requires  an  iteration  parameter,  has  a  greater  set-up  time, 
and  needs  more  storage  than  POLCG. 


1.  Introduction 

In  the  numerical  solution  of  two  and  three  dimensional 
partial  differential  equations  one  is  often  led  to  a  system  of 
linear  equations  (or  a  matrix  equation)  which  must  be  solved. 
Efficient  performance  of  this  task  by  the  researcher  has  become 
complicated  with  the  advent  of  vector  and  parallel  processing 
machines.  Depending  upon  the  algorithm,  machine,  and  problem 
to  be  solved,  an  "optimum"  strategy  taken  by  the  researcher 
should  take  into  consideration  properties  of  various  algorithms 
such  as   their  vectorizability   and  parallelizability . 

The  present  work  is  an  effort  to  study  the  efficiency  of 
several  iterative  methods  used  in  the  solution  of  symmetric 
banded  systems  of  linear  equations.  They  are:  Stone' s[l]  Strong- 
ly Implicit  Procedure (SIP) ;  the  Incomplete  Cholesky  Conjugate 
Gradient  Method  (ICCG)  of  Meijerink  and  van  der  Vorst[2],  includ- 
ing its  vectorized  variant  (VICCG(O) ) [3] ,  and  Gustaf sson ' s[4 ] 
modified  form  (MICCG) ;  a  diagonally  scaled  conjugate  gradient 
method  (DSCG) ;  Saad's[5]  polynomial  preconditioned  conjugate 
gradient  method  (POLCG) ;  and  finally  a  polynomial  variation  of 
ICCG  (PICCG) .  The  machine  on  which  comparisons  are  made  is  a 
Cyber  205  which  has  a  peak  operating  performance  of  100  mega- 
flops for  vector  addition  and/or  multiplication  (200  megaflops 
for  linked  triadic  vector  operations) . 

New  algorithms  are  not  introduced  (although  some  modifica- 
tions to  existing  methods  are  tried) .  The  purpose  of  this  effort 
is  to  take  a  series  of  known  methods,  apply  them  to  problems  with 


similar  characteristics  to  what  might  be  found  in  real  physical 
settings  (including  three  dimensional  problems  of  groundwater 
modelling),  and  determine  which  method(s)  reviewed  have  superior 
performance  on  a  Cyber  205. 

Little  has  been  written  in  the  water  resource  journals  on 
many  of  the  now  available  vectorizable  iterative  techniques  for 
solving  large  sparse  matrices.  Trescott  and  Larson [10]  studied 
the  convergence  properties  of  SIP  in  comparison  to  ADI (Alternat- 
ing Direction  Implicit)  and  LS0R(Line  Successive  Overrelaxation) . 
In  1981  Kuiper[ll]  compared  SIP  to  ICCG  concluding  that  for  con- 
fined aquifer  problems  ICCG  appeared  to  be  superior  while  for 
water  table  conditions (nonlinear)  the  methods  performed  compar- 
ably well.  More  recently  Kuiper[12]  compared  a  number  of  iterative 
techniques  on  a  set  of  linear  and  nonlinear  groundwater  problems. 
The  conclusion  of  his  most  recent  study  was  that  the  Picard- 
preconditioned  conjugate  gradient  methods  performed  best  for 
both  the  linear  and  nonlinear  problems.  Each  of  these  papers  are 
concerned  with  performance  on  scalar  machines. 

In  the  open  literature  there  are  several  comparisons  of 
iterative  methods  which  take  into  account  vectorizability  of 
particular  algorithms,  and  their  efficiency  on  given  "supercom- 
puters". A  few  of  the  more  pertinent  references  are  given  below. 

Meurant[6]  and  Jordan [7]  have  compared  VTCCG,  POLCG,  and 
block  preconditioned  conjugate  gradient  methods  for  two  dimens- 
ional problems.  Meurant's  study  concluded  that  for  "small"  prob- 
lems (grids  of  60x60)  POLCG  seemed  optimal  on  the  2-pipe  Cyber 


205  while  for  larger  problems  the  block  preconditioning  methods 
appeared  to  do  better.  Jordan  preferred  polynomial  precondition- 
ed from  the  standpoint  of  parallel  processing. 

Hayami  and  Harada[8]  have  recently  estimated  that  on  the  NEC 
SX-2  supercomputer  (assuming  an  acceleration  due  to  vector  pro- 
cessing of  40  times  the  scalar  rate)  DSCG  will  run  ten  times 
faster  than  ICCG  for  most  problems.  As  the  Cyber  205  is  also  a 
highly  vectorizable  machine  DSCG  has  been  included  as  one  of  the 
algorithms  to  be  tested. 

Gustaf sson [ 4 ]  considered  a  class  of  first  order  methods 
(SSOR  and  MICCG(n))  which  were  shown  to  have  markedly   better 
convergence  properties  than  ICCG.  Extending  Gustaf sson' s  MICCG, 
Ashcroft  and  Grimes [9]  applied  the  algorithm  to  three  dimensional 
problems  and  in  particular  programmed  it  in  such  a  way  that  (at 
least  on  a  Cray  machine)  much  of  the  code  can  be  made  to  run  con- 
currently. Their  analysis  compares  MICCG,  SSOR,  DSCG  and  Nofill 
Red/black   Incomplete  factorization  preconditioners  concluding 
that  the  "concurrent"  MICCG  method,  while  not  as  vectorizable  as 
some  of  the  other  methods,  is  superior  due  to  its  relatively  low 
iteration  count. 

The  present  work  differs  from  that  previously  reported  in 
the  following  respects.  The  machine  used  is  exclusively  a  two- 
pipe  Cyber  205,  both  two  and  three  dimensional  problems  are  run, 
two  of  the  five  test  problems  have  anisotropic  conditions,  and 
finally,  the  set  of  iterative  techniques  tested  is  larger  than 
previously  reported. 


The  matrix  equations  used  in  the  present  comparison  arise 
from  finite  difference  approximations  to  the  2D  and  3D  partial 
differential  equations  governing  the  flow  of  groundwater  in  con- 
fined aquifers.  These  equations  involve  self  adjoint  parabolic 
(or  elliptic  for  the  case  of  steady  state  flow)  differential 
operators  for  which  all  of  the  iterative  schemes  tested  are  ap- 
plicable. Equations  of  a  similar  type  result  in  various  transport 
phenomenon  such  as  heat  conduction  and  laser  fusion. 

The   paper  is  outlined  as  follows.  In  section  2,  the  five 
model  problems  are  described.  This  is  followed  in  section  3  by  a 
short  explanation  of  the  algorithms.  Section  4  concludes  with 
comparitive  results  of  each  on  the  set  of  test  problems. 


2.  Formulation  of  the  Problems 

The  derivation  of  the  three  dimensional  partial  differential 
equation  governing  the  distribution  of  hydraulic  head  in  a  con- 
fined aquifer  can  be  found  in  Reddell  and  Sunada[13].  Assuming 
components  of  the  transmissivity  tensor  lie  along  coordinate  axes 
the  equation  can  be  concisely  written  as: 

(1)  Sht  =  (Kxhx)x  +(Kyhy)y  +(Kzhz)z  +  R 

where  h  is  the  pressure  head  to  be  determined;  S  is  the  storage 
coefficient  of  the  porous  media;  Kx,  K^,  and  Kz  are  transmis- 
sivities  of  the  aquifer  in  the  x,y,  and  z  directions;  and  R  is 
the  volumetric  flux  of  recharge  or  withdrawal  per  unit  volume  of 
water  from  the  aquifer.  Sources  and  sinks  are  approximated  by 
delta  functions  with  strengths  equal  their  volumetric  flux. 

The  primary  boundary  conditions  are  Dirichlet  and  homogen- 
eous Neumann,  which  correspond  physically  to  constant  head  and 
no-flux  boundaries  respectively. 

In  each  of  the  test  problems,  the  aquifer  is  contained  in 
a  rectanqular  solid  whose  faces  are  assigned  appropriate  bound- 
ary conditions.  For  aquifers  of  a  more  general  shape,  points 
exterior  to  the  aquifer  boundaries  are  assigned  zero  trans- 
missivities  for  the  purpose  of  eliminating  their  role  in  the 
determination  of  head  values  interior  to  the  aquifer  (see  for 
example  [14]).  This  is  done  in  problem  five  where  an  "L"  shaped 
region   is  considered. 

In   the   first  four  problems,  a  steady  state   solution   is 


sought.  Of  these  the  fourth  has  a  nonunique  solution  since  all 
of  its  boundary  conditions  are  homogeneous  Neumann.  The  initial 
guess  for  h  in  each  of  the  first  four  problems  is  zero  except 
at  points  where  sources/sinks  are  present  or  non-homogeneous 
Dirichlet  boundary  conditions  apply.  The  fifth  test  problem 
considers  a  fully  time  dependent  problem,  but  solves  for  only 
one  time  step,  given  values  of  the  head  everywhere  for  the 
immediately  preceding  time  level: 

(2)  h(x,y,z)=102(x~y+1) . 

problem  1: 

The  first  problem  models  an  isotropic  homogeneous  aquifer 
with  constant  head  boundaries.  No  sources  or  sinks  are  present. 
The  transmissivities  and  boundary  conditions  are: 

Kx=Ky=Kz=l . 0   and 
(3) 

h=l  on  all  boundaries. 

problem  2: 

The  second  test  problem  examines  the  effect  of  using  trans- 
missivities with  linear  gradients  along  each  of  the  coordinate 
axes.  As  before,  no  sources  or  sinks  are  present.  Specification 
of  case  two  is  as  follows: 

Kx=l-16y/17 
Ky=(16x+1)/17 
Kz=(96z+1)/10 


(4) 


hn  =  0   at   x  =  0   and   z  =  l 


hn=l  at  x=l 


h=10  at  y=0,  and 
h=l  at  y=l,z=0. 
problem  3 : 

Refer  to  figure  1  in  the  description  of  the  third  test  prob- 
lem. There  are  four  subsections  of  the  aquifer  which  are  isotrop- 
ic and  homogeneous,  but  whose  transmissivities  lie  in  the  range 
zero  to  forty.  A  feature  of  this  problem  which  makes  it  difficult 
is  that  one  of  the  subsections  is  an  impermeable  layer  with  a 
relatively  small  transmitting  aperture.  A  source  of  strength  1.0 
is  positioned  at  (. 875, . 9375, . 1875) ,  and  a  sink  of  strength  .25 
is  located  at  ( . 4375, . 625, . 5) .  Specification  of  case  three  is: 

KX=KV=KZ=1.0  in  region  A 

Kx=KY=Kz=0.0  in  B 

KX=KY=KZ=2  0.0  in  C 

Kx=Ky=Kz=40 .0   in   D 
(5) 

hn=0.0  at  y=0,z=l 

hn=-1.0  at  x=l 
h=l  at  y=l,  z=0,  and 
h=5  at  x=0. 
The  center  point  of  the  impermeable  shell  (region  B)  is  at 
(.5, .4375, .5)  with  side  length  .75  and  width  .125.  The  center- 
point  for  region  C  is  the  same  as  for  B  but  has  a  side  length 
of  .375.  The  center  for  region  D  is  (.5,. 9,. 5)  with  side  length 
.125,  and  finally  the  aperture  is  centered  at  (.75,. 5,. 5)  with 


side  length  in  the  y  and  z  directions  of  .25,  and  width  .125. 

problem  4: 

Test  case  four  is  a  two  dimensional  example  taken  from 

Stones ■ s  paper [1].  Two  dimensional  problems  are  solved  by  the 

three  dimensional  code  by  keeping  boundary  conditions  and  trans- 

missivities  independent  of  the  z  coordinate.  See  figure  2  for  an 

illustration  of  the  domain  modelled.  Specifications  are: 

Kz=l  everywhere 

Kx=k^=1   in  region  A 
(6) 

Kx=l,  KY=100  in  B 

Kx=100,  k¥=1  in  C 
Kx=K^=Kz=0  in  D,  and 
hn=0  on  all  boundaries. 
Line  sources  for  problem  four  were  located  at  (x,y)  equal  to 
( . 1, . 1) , ( . 1, .9) ,  (.767, .133)  with  strengths  1.0, .5  and  .6  re- 
spectively; while  sinks  were  located  at  (.4 67,. 5)  and  (.9,. 9) 
with  strengths  1.8  3  and  .27.  They  are  represented  in  figure  2 
as  (  +  )  '  s  and  (-)  •  s. 
problem  5: 

Problem  five  can  be  found  in  Kershaw's  paper [15].  It  is 
two  dimensional  and  is  without  sources  or  sinks.  In  figure  3, 
D  or  N  at  a  boundary  signifies  homogeneous  Dirichlet  or  homo- 
geneous Neumann  conditions  respectively.  Equation  (2)  gives  the 
initial  condition  for  this  problem,  and  the  transmissivities  are: 

KX=KY=KZ=10"4  in  region  A 
KX=KV=KZ=10"2   in   B 
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(7) 

Kx=Ry=Kz=10  in  C,  and 

Kx=Ky=Kz=106  in  D. 

3.  Summary  of  the  Numerical  Methods 

Equation  (1)  is  approximated  by  a  node-centered  finite  dif- 
ference formulation.  The  truncation  error  is  second  order  in  the 
spatial  increments  if  a  uniform  mesh  is  used.  The  accuracy  drops 
to  first  order  if  the  mesh  is  nonuniform.  The  boundary  conditions 
are  handled  in  such  a  way  that  the  second  order  truncation  error 
is  maintained  regardless  of  the  spacing. 

By  employing  the  "natural"  ordering  of  the  grid  points  a 
seven  banded  symmetric  matrix  equation  results, 

( 8 )  Ah=b 

where  A  is  positive  semi-definite,  b  represents  known  boundary 
conditions  and  possible  sources  or  sinks,  and  h  is  the  vector  of 
unknown  pressure  head  values  to  be  solved  for. 

Solving  for  h  involves  iterating  on  an  initial  guess  to  a 
final  solution  which  in  some  sense  must  be  close  to  the  exact 
answer.  Defining  the  mth  iterate  as  h(m) ,  an  error  vector 
associated  with  the  mtn  iterate  is  introduced: 

(9)  em=b-Ah(m) 

The  inf-norm  of  e(m)  scaled  to  e(°)  defines  the  mtn  residual 
used  in  the  results  section  of  this  paper.  The  iterative  proc- 
ess is  halted  when  the  absolute  magnitude  of  the  scaled  residual 

•  —  ft 

is  less  than  a  tolerance  level  of  10  °. 

The  iterative  algorithms  are  outlined  below.  For  a  complete 


11 


description   of  any  particular  method  the  reader  is   referred   to 
the  original  works  from  which  the  methods  have  been  extracted. 
SIP: 

The  original  version  of  Stone 's[l]  SIP  was  extended  to  three 
dimensional  problems  by  Weinstein  et  al[16].  The  method  splits 
up  the  matrix  A(eq.  8)  into  an  approximate  LU  factorization,  where 
L  and  U  are  upper  and  lower  triangular  matrices.  The  product 
of  these  two  matrices,  however,  yields  thirteen  nonzero  diagonals 
rather  than  seven  which  were  originally  in  A.  Two  term  Taylor 
series  corrections  (weighted  by  an  iteration  parameter)  are 
"lumped"  into  the  original  LU  factors  producing  a  revised  fact- 
orization which  minimizes  somewhat  the  effects  of  the  concom- 
mitant  diagonals. 

Values  for  the  iteration  parameters  range  between  zero  and 
one,  with  values  close  to  one  being  near  optimal.  Careful  sel- 
ection of  this  parameter  is  critical.  Too  large  a  value  could 
cause  divergence  while  too  small  a  value  may  lead  to  very  slow 
convergence.  A  cycle  of  3  parameters  was  selected  (see  eq  10) 
where  amax  is  found  by  employing  the  formula  given  in  Wein- 
stein et  al[16]  (using  an  average  value,  rather  than  a  maximum 
value  over  the  entire  domain) . 

(10)  l-at=(l-amax)t/3/  for  t=l,2,3. 

A  refinement  of  SIP  involves  renumbering  the  nodes  in  pro- 
ducing the  matrix  equation  (8) .  Weinstein  et  al  use  two  out  of  a 
a  possible  four  distinct  numberings  that  can  be  made  by  re- 
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versing  the  order  of  the  nodes  along  one  or  two  of  the  coord- 
inate axes.  Including  a  second  renumbering  of  the  A  matrix  and 
performing  two  approximate  inversions  of  A  for  each  SIP  iteration 
was  found  to  increase  the  total  computation  time  reguired  to  con- 
verge to  a  solution.  In  addition,  if  (as  is  done  here)  the  L  and 
U  matrices  are  stored  for  each  renumbering  of  the  A  matrix  (as 
well  as  for  each  a^.)  rather  than  recalculated  at  each  SIP 
iteration,  the  amount  of  storage  reguired  can  become  guite  large. 
For  these  reasons  only  the  natural  ordering  of  the  nodes  is  used. 
Conjugate  Gradient  Methods; 

With  the  recent  introduction  of  incomplete  Cholesky  pre- 
conditioners[2 ]  to  the  basic  conjugate  gradient  algorithm [ 17 , 
18,19],  there  has  been  a  renewal  of  interest  in  this  algorithm 
and  an  ongoing  search  for  "optimal"  preconditioners [6, 7 , 8 , 9 ] . 

Conjugate  gradient  methods  differ  from  SIP  in  that  they 
are  based  upon  the  assumption  that  A  is  a  symmetric  matrix. 
SIP  has  no  such  restriction.  Variations  of  preconditioned 
conjugate  gradient  methods  do  exist  for  nonsymmetric  matrices 
[20,21]  but  the  theory  and  application  is  significantly  dif- 
ferent than  for  the  symmetric  case. 

The  incomplete  Cholesky  conjugate  gradient  method  of 
Meijerink  and  van  der  Vorst[2],  involves  a  similar  factorization 
of  the  A  matrix  as  found  in  SIP.  It  is  given  by: 
(11)  A~LDLT, 

where  D  is  a  diagonal  matrix,  and  L  is  a  lower  triangular  matrix 
whose  nonzero  diagonals  match  positions  of  those  in  the  A  matrix. 
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The  product  of  LU  from  SIP  and  LDLT  from  ICCG  are  ident- 
ical provided  the  iteration  parameter  used  in  SIP  is  set  to  zero. 
The  ICCG  factorization  neglects  the  six  additional  diagonals 
which  result  in  the  product  of  LDLT.  Alternative  factoriza- 
tions include  extra  diagonals  in  the  L  matrix  in  an  effort  to 
give  a  more  accurate  Cholesky  decomposition,  but  these  are  not 
considered  here.  Such  variants  tend  to  increase  the  storage 
requirements  of  the  method  and  the  amount  of  recursive  work  in 
each  iteration.  This  was  observed  by  Kershaw [15]. 

Gustaf sson[4 ]  improved  on  ICCG  by  incorporating  Stone's  idea 
of  minimizing  the  effect  of  concommitant  diagonals  produced  in 
the  LDLT  factorization.  Unlike  Stone,  who  uses  a  two  term  Taylor 
expansion  of  the  unwanted  terms  of  the  factorization,  Gustaf sson 
employs  a  one  term  Taylor  expansion.  This  guarantees  that  a  sym- 
metric factorization  can  be  found  and  that  the  conjugate  grad- 
ient algorithm  remains  applicable.  Gustaf sson' s  method  is  refer- 
red to  as  MICCG. 

Just  as  in  Stone's  method  an  iteration  parameter  is  used 
which  ranges  in  value  between  zero  and  one.  MICCG  was  found  to 
be  less  sensitive  to  the  actual  value  of  the  iteration  parameter 
than  is  SIP.  For  each  of  the  five  problems  tested,  a  trial  and 
error  system  was  used  to  find  optimal  values  (the  magnitudes 
ranged  between  .95  and  1.0). 

The  advent  of  vector  processing  machines  has  resulted  in  ef- 
forts to  modify  the  inherent  recursion  involved  in  inverting  the 
LDLT  approximation  to  the  A  matrix.  This  was  partially  ac- 
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complished  by  van  der  Vorst[3]  who  split  the  lower  triangular 
matrix  L  into  diagonal  blocks  which  could  then  be  individually 
inverted  via  truncated  Neumann  series. 

Van  der  Vorst's  method  of  vectorizing  the  recursive  part  of 
the  iteration  loop  has  been  employed  in  "vectorizing"  MICCG.  The 
resultant  algorithm  is  referred  to  in  this  paper  as  VICCG  and 
should  not  be  confused  with  van  der  Vorst's  vectorized  ICCG.  As 
with  MICCG  an  iteration  parameter  must  be  provided  and  in  general 
is  found  to  have  a  magnitude  less  than  the  optimal  value  found 
for  MICCG.  It  is  also  found  by  trial  and  error. 

One  final  variant  of  the  MICCG  algorithm  involves  using  a 
three  term  truncated  Neumann  series  approximation  for  invert- 
ing the  lower  triangular  matrix  L  in  the  LDLT  factorization. 
While  this  variation  eliminates  recursion  altogether  in  the 
iteration  loop,  it  was  found  that  for  the  fastest  convergence 
the  "optimal"  iteration  parameter  was  usually  zero.  The  re- 
sulting algorithm  therefore  is  actually  a  polynomial  form  of  the 
unmodified  original  ICCG  method.  It  is  here  referred  to  as  PICCG. 

An  alternative  to  factoring  the  A  matrix  to  obtain  a  pre- 
conditioner,  involves  approximating  A's  inverse  by  a  matrix 
polynomial [22 ] .  Recently,  Dubois  et  al[23]  have  reported  success 
in  using  a  truncated  Neumann  series  expansion  in  A  for  an  approx- 
imation to  A's  inverse.  Johnson  et  al[24]  have  proposed  a  poly- 
nomial preconditioner  whose  coefficients  are  determined  in  an 
effort  to  reduce  the  condition  number  of  the  product  of  A  and 
its  preconditioner.   Minimizing  this  number  is  important  since 
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theoretically  the  rate  of  convergence  of  the  conjugate  gradient 
algorithm  increases  as  the  magnitude  of  the  condition  number  is 
reduced. 

Saad's[5]  method  is  a  variant  of  the  method  of  Johnson  et 
al.  It  attempts  to  minimize 

^max 

(12)  J  (l-s(AO)2w(/x)d/i 

over  all  polynomials  s(/x)  less  than  or  equal  to  a  specified 
degree,  and  where   Mmax  ^s  the  largest  eigenvalue  of  the  matrix 
A.  In  the  present  work  a  polynomial  of  third  degree  is  sought, 
with  weight  factor: 

(13)  w(/x)  =  (Mmax-iLt)~^~3>2- 

Saad  originally  proposed  finding  Mmax  ky  employing  Gershgorin 
circles[25].  In  the  present  work  Mmax  -*-s  fixe^  at  2  which  is  near 
the  value  found  by  employing  Gershgorin' s  theorem  to  matrices 
(of  the  type  considered  here)  which  have  been  diagonally  scaled. 
In  almost  every  instance  the  choice  of  2  as  an  upper  bound  over 
that  found  via  Gershgorin' s  theorem  produced  faster  convergence. 
This  algorithm  is  referred  to  as  POLCG. 

If  instead  of  a  polynomial  of  order  three  one  of  zeroth 
order  is  selected,  and  the  A  matrix  is  diagonally  scaled,  the 
preconditioner  for  the  conjugate  gradient  algorithm  becomes  the 
identity  matrix.  This  simple  preconditioner  is  referred  to  here 
as  DSCG. 
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4.  Results 

Results  of  the  tests  are  presented  in  tables  I  and  II  of 
this  section.  Included  are:  timings,  iteration  parameters, 
iteration  counts,  and  megaflop  rate  for  the  iteration  loop  of 
each  algorithm.  In  table  II,  results  from  a  subset  of  the  prob- 
lems listed  in  table  I  which  have  been  modified  to  increase 
the  number  of  nodes  and  thereby  the  size  of  matrix  equation  to 
be  solved  are  given. 

The  categories  in  the  tables  are  relatively  self  explan- 
atory. By  set-up  time  is  meant  the  time  necessary  to  set  up  the 
matrix  equation  and  to  initiate  the  iteration  loop.  The  megaflop 
rate  was  found  by  dividing  the  total  number  of  operations  per- 
formed by  the  average  CPU  time  needed  to  complete  one  iteration 
loop.  Parenthetical  values  alongside  those  of  VICCG  are  results 
of  setting  its  iteration  parameter  to  zero.  This  produces  a  code 
which  mimics  van  der  Vorst's[3]  vectorized  version  of  ICCG. 

The  burden  of  vectorizing  each  of  the  algorithms  was 
carried  primarily  by  the  Cyber  205  vector  compiler  with  scalar 
optimizer.  Few  special  Q8  calls  are  used.  In  particular  they 
are  Q8MAX  and  Q8SDOT.  Q8MAX  is  used  by  each  of  the  schemes 
for  computing  the  inf-norm  of  the  residual  vector  for  conver- 
gence checking.  Q8SD0T  gives  the  scalar  dot  product  of  two 
vectors  and  is  used  solely  by  the  conjugate  gradient  codes. 
Special  "chaining"  of  do  loops  for  the  purpose  of  increasing 
the  number  of  linked  triadic  operations  was  not  done. 

Diagonal  scaling  of  the  A  matrix  was  performed  only  on 
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POLCG  and  DSCG.  To  find  the  scaled  residual  as  defined  by  (9) 
the  computed  residuals  are  first  multiplied  by  the  diagonal 
scaling  factors  to  find  "true"  values  of  the  residuals.  Scaling 
the  remaining  conjugate  gradient  algorithms  as  outlined  by 
Eisenstat [26]  would  not  reduce  the  number  of  calculations  in 
the  present  instance  because  the  same  number  of  operations 
saved  in  one  portion  of  the  algorithm  would  need  to  be  per- 
formed in  finding  the  "true"  residual  as  given  in  (9) . 

Two  of  the  codes  seem  to  perform  better  than  all  others 
in  table  I.  They  are  MICCG  and  POLCG,  which  have  the  fastest 
iteration  times  on  2  and  3  of  the  test  problems  respectively. 
SIP  performed  relatively  poorly  on  all  but  the  first  two  test 
cases.  (If  the  tolerance  level  for  convergence  changes  from 
10~8  to  10~3  SIP  does  dramatically  better-particularly 
on  problem  2-but  for  such  a  stopping  criterion  SIP  gives  in- 
correct results  for  problem  3.  Given  this  larger  tolerance  level, 
POLCG  and  SIP  performed  the  best  on  two  problems  each.)  DSCG  and 
PICCG  did  better  than  both  ICCG  and  VICCG  for  the  problems  in 
table  I  leading  one  to  believe  that  vectorizing  the  ICCG  code  in 
the  manner  of  van  der  Vorst  is  not  particularly  efficient  for 
small  three  dimensional  problems  because  the  vector  lengths  are 
not  long  enough.  The  situation  for  VICCG  is  worse  if  an  iteration 
parameter  is  not  used (as  it  is  for  MICCG)  to  improve  on  the  iter- 
ation count.  The  phenomenal  timings  of  SIP  and  MICCG  for  test 
problem  1  indicate  that  the  preconditioner (for  MICCG)  and  the  LU 
factorization (for  SIP)  are  nearly  exact  inverses  of  the  original 
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matrix.  (The  exact  answer  for  test  problem  one  was  a  constant 
head  value  throughout  the  domain.) 

Table  I  highlights  some  striking  differences  between  POLCG 
and  MICCG.  POLCG  has  nearly  three  times  the  megaflop  rate  of 
MICCG  while  MICCG  reguires  on  average  only  half  as  many  itera- 
tions. On  a  scalar  machine  POLCG  could  not  compete  with  MICCG 
because  it  reguires  more  iterations  and  actually  performs  more 
operations  in  its  inner  loop  than  MICCG.  The  vectorizability  of 
POLCG  allows  it  to  be  competitive  with  MICCG  on  a  Cyber  205. 

Continuing  the  comparison  between  POLCG  and  MICCG  it  is 
noted  that  for  "optimal"  convergence  in  MICCG  a  suitable  iter- 
ation parameter  must  be  found  while  for  POLCG  no  such  parameter 
is  needed.  Ashcroft  and  Grimes [9]  recommend  an  optimal  iter- 
ation parameter  of  just  less  than  unity  for  MICCG  based  upon 
empirical  evidence.  While  these  observations  appear  to  be  gen- 
erally true,  there  are  exceptions.  In  the  first  test  problem 
if  a  value  of  .999  is  used  rather  than  1.0  the  iteration  count 
increases  from  2  to  16.  While  problem  1  is  a  special  case,  in- 
stances occur  where  the  value  of  an  "optimal"  iteration  parameter 
is  rather  sharply  defined  (in  particular  when  modelling  aniso- 
tropic media) .  For  such  cases  several  runs  of  MICCG  may  be  re- 
quired to  ensure  that  a  given  choice  of  iteration  parameter  is 
not  far  from  optimal.  For  the  "larger"  problems  reported  in 
table  II,  it  was  found  that  the  iteration  count  varied  from  61 
to  42  to  51  as  the  iteration  parameter  for  MICCG  changed  from 
.95  to  .994  to  .9996  on  problem  2.  Similarly  on  problem  4  of 
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table  II,  altering  the  iteration  parameter  of  MICCG  from  .95  to 

.999  to  .9995  produced  iteration  counts  of  102,  63,  and  66 
respectively . 

The  neccesity  of  an  iteration  parameter  for  MICCG  may 
become  a  problem  if  the  matrix  equation  is  periodically  updated 
(as  is  done  when  modelling  groundwater  problems  for  which  water 
table  conditions  apply) .  If  revision  of  the  matrix  equation  is 
frequently  done  during  the  solution  of  a  time  dependent  problem 
one  would  expect  some  variation  in  the  value  of  the  "optimal" 
iteration  parameter.  To  find  its  value  at  each  update  would  be 
impractical,  while  using  the  same  value  throughout  the  itera- 
tions may  or  may  not  be  optimal  for  the  whole  problem.  The  only 
way  to  be  assured  of  its  "optimal ity"  is  to  run  a  series  of  tests 
on  the  complete  time  dependent  problem  for  a  variety  of  iter- 
ation parameters.  Analysis  of  a  single  time  step  is  not  suffic- 
ient. 

A  further  disadvantage  in  using  MICCG  rather  than  POLCG  on 
problems  requiring  frequent  updates  is  that  the  set-up  time  for 
MICCG  is  roughly  three  times  that  of  POLCG. 

An  observable  advantage  in  using  MICCG  over  all  of  the  other 
methods  has  been  previously  reported [9]  and  is  confirmed  in  the 
tests  reported  here.  Comparing  iteration  counts  in  problems  2 
and  4  of  table  I  with  those  in  table  II,  it  can  be  seen  that 
the  increase  for  MICCG  compared  to  that  for  all  of  the  other 
methods  is  much  less.  The  rate  at  which  the  number  of  iterations 
increases  as  a  function  of  N  (number  of  nodes  along  one  dimen- 
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sion  of  the  numerical  grid)  is  OfN'5)  for  MICCG  while  for 
POLCG  and  ICCG  the  increase  appears  to  be  0(N).  The  rate  for 
VICCG  falls  somewhere  in  between  depending  upon  the  test  problem. 

VICCG  improves  substantially  as  the  size  of  the  problem 
increases  as  can  be  seen  by  comparing  table  I  results  to  those  of 
table  II.  However  in  cases  of  strong  anisotropy,  as  is  present  in 
test  problem  4,  VICCG  has  some  definite  difficulty.  In  fact  the 
presence  of  an  iteration  parameter  in  the  algorithm  is  hardly 
warranted. 

In  conclusion,  it  would  appear  that  for  relatively  small 
domains  (on  the  order  of  a  few  thousand  unknowns)  POLCG  is  a  very 
good  algorithm  as  applied  to  the  solution  of  two  and  three  dim- 
ensional groundwater  flow  equations.  For  larger  problems  MICCG  is 
perhaps  a  better  choice  when  circumstances  are  such  that  frequent 
updates  of  the  matrix  equation  are  unnecessary.  As  the  number  of 
unknowns  continues  to  increase,  VICCG  is  an  attractive  alterna- 
tive to  MICCG  because  of  its  vectorizability .  However,  strongly 
anisotropic  conditions  seem  to  severely  increase  VICCG' s  iter- 
ation count  and  for  such  problems  MICCG  may  give  superior  per- 
formance. 
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Table  1 

Problem  1  (4913  nodes) 


Set  up 
Algorithm  time 


Iteration  Number  of    Iteration   Megaflop 
parameter   iterations    time       rate 


SIP 

.157        1.0 

2 

.007 

20 

ICCG 

.038       n/a 

19 

.107 

28 

POLCG 

.029       n/a 

13 

.039 

92 

VICCG 

.099(.099)  .9(0.0) 

19(21) 

.223(.252)  21 

MICCG 

.091        1.0 

2 

.001 

28 

DSCG 

.02  6       n/a 

39 

.045 

94 

PICCG 

.039        n/a 

21 

.050 

89 

Problem  2 

(4913  nodes) 

Set  up   Iteration 

Number  of 

Iteration 

Megaflop 

Algorithm 

time    parameter 

iterations 

time 

rate 

SIP 

.155        .9994 

31 

.206 

20 

ICCG 

.038        n/a 

44 

.278 

28 

POLCG 

.028       n/a 

55 

.183 

92 

VICCG 

.097(.097)  .98(0.0) 

30(44) 

.367(.556)  21 

MICCG 

.089         .975 

29 

.184 

28 

DSCG 

.026        n/a 

178 

.211 

94 

PICCG 

.036       n/a 

88 

.224 

89 

Problem  3 

(4913  nodes) 

Set  up   Iteration 

Number  of 

Iteration 

Megaflop 

Algorithm 

time    parameter 

iterations 

time 

rate 

SIP 

.147        .9974 

524 

3.585 

20 

ICCG 

.038       n/a 

45 

.278 

28 

POLCG 

.029        n/a 

39 

.128 

92 

VICCG 

.090(.090)  .92(0.0) 

40(48) 

.500(.609)  21 

MICCG 

.082        .95 

31 

.182 

28 

DSCG 

.026        n/a 

124 

.146 

94 

PICCG 

.037        n/a 

52 

.131 

89 

Problem  4 

(1922  nodes) 

Set  up    Iteration 

Number  of 

Iteration 

Megaflop 

Algorithm 

time    parameter 

iterations 

time 

rate 

SIP 

ICCG 

POLCG 

VICCG 

MICCG 

DSCG 

PICCG 


.042 

.017 

.014 

.043( 

.043 

.013 

.017 


043) 


.9994 

n/a 

n/a 

.4(0.0) 

.9953 

n/a 

n/a 


127 

72 

103 

114(116) 

43 

336 

177 


315 

166 

128 

271( 

101 

150 

167 


275) 


18 
26 
85 
26 
26 
90 
83 
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Table  1 (continued) 
Problem  5  (5202  nodes) 


Set  up 
Algorithm   time 


Iteration   Number  of    Iteration   Megaflop 
parameter   iterations    time       rate 


SIP 

.166         .9998 

189 

1.241        19 

ICCG 

.046       n/a 

58 

.373         25 

POLCG 

.038        n/a 

52 

.160         93 

VICCG 

.102(.102)  .9(0.0) 

47(59) 

.224(.282)  47 

MICCG 

.105        .95 

35 

.212         27 

DSCG 

.035        n/a 

162 

.186        94 

PICCG 

.046       n/a 

67 

.162        87 

Table  2 


Problem  2  (35937  nodes) 


Algorithm 


Set  up 
time 


Iteration 
parameter 


Number  of 
iterations 


Iteration 
time 


Megaflop 
rate 


SIP 

1.034       .9994 

169 

8.562 

19 

ICCG 

.207        n/a 

84 

3.995 

27 

POLCG 

.150        n/a 

139 

3.768 

84 

VICCG 

.570(.570)  .996(0.0] 

I    47(84) 

2.792  (5.082 

>)  33 

MICCG 

.566         .994 

42 

1.967 

27 

DSCG 

.136       n/a 

457 

4.306 

87 

PICCG 

.207       n/a 

290 

6.254 

78 

Problem  4 

(7442  nodes) 

Set  up   Iteration 

Number  of 

Iteration 

Megaflop 

Algorithm 

time    parameter 

iterations 

time 

rate 

SIP 

.251 

ICCG 

.745 

POLCG 

.065 

VICCG 

.154  ( 

MICCG 

.174 

DSCG 

.058 

PICCG 

.073 

154) 


.994 

n/a 

n/a 

.4(0.0) 

.9988 

n/a 

n/a 


223 

145 

193 

221(232) 

63 

640 

328 


2.120 

1.298 

.855 

444  (1 

.589 

1.056 

1.132 


510) 


19 
26 
93 
50 
26 
94 
88 
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